# LLM-Auditing-CoIn

This repository contains the implementation of CoIn. The code is organized into numbered stages, as follows:

## Directory Overview

### `0_preprocess/`
Scripts for data preprocessing. A subset of the data is sampled from HuggingFace datasets.
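
The README does not say how the subset is drawn. The sketch below shows one reproducible way to subsample records. It assumes only that the records fit in an in-memory sequence, and it reuses the project-wide seed `42`. `sample_subset` is a hypothetical helper, not a function from this repository. With the HuggingFace `datasets` library, `dataset.shuffle(seed=42).select(range(n))` achieves a similar effect. The actual preprocessing scripts may do this differently.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_subset(records: Sequence[T], n: int, seed: int = 42) -> List[T]:
    """Hypothetical helper: draw n records without replacement, reproducibly.

    A local random.Random instance is used, so the draw neither depends on
    nor disturbs the global RNG state.
    """
    rng = random.Random(seed)
    # Sort the sampled indices so the subset keeps the original record order.
    indices = sorted(rng.sample(range(len(records)), n))
    return [records[i] for i in indices]
```

Calling `sample_subset(records, 10)` twice returns the same 10 records, because the generator is re-created from the same seed on each call.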

### `1_mk_data/`
Constructs training and evaluation data for the two matching heads: **Tokens2Block** and **Block2Answer**.

### `2_hash_tree/`
Implements hash tree construction, verification, and profiling.
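
The README does not specify the tree layout, hash function, or leaf contents used here. The sketch below is a generic binary Merkle tree. It makes four assumptions: SHA-256 as the hash, leaves as arbitrary byte strings (for example, serialized blocks), duplication of the last node on odd-sized levels, and `0x00`/`0x01` prefixes to separate leaf hashes from internal-node hashes. The names `build_levels`, `merkle_root`, `make_proof`, and `verify_proof` are hypothetical and do not come from this repository.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return every tree level, from the hashed leaves up to the root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(b"\x00" + leaf) for leaf in leaves]  # 0x00 prefix marks leaves
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate the last node on odd levels
        level = [_h(b"\x01" + level[i] + level[i + 1])  # 0x01 marks inner nodes
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(leaves: List[bytes]) -> bytes:
    return build_levels(leaves)[-1][0]


def make_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling is on the right."""
    proof = []
    for level in build_levels(leaves)[:-1]:
        sib = index ^ 1
        # An unpaired last node is hashed with itself, mirroring build_levels.
        sibling = level[sib] if sib < len(level) else level[index]
        proof.append((sibling, index % 2 == 0))
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path from one leaf to the root and compare with the root."""
    node = _h(b"\x00" + leaf)
    for sibling, sibling_on_right in proof:
        pair = node + sibling if sibling_on_right else sibling + node
        node = _h(b"\x01" + pair)
    return node == root
```

A verifier that holds only the root can check one leaf against a logarithmic-size proof without seeing the other leaves. Any change to a leaf changes the root.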

### `3_Block2Answer/`
Training and evaluation code for the **Block-to-Answer Verification** component.

### `3_Tokens2Block/`
Training and evaluation code for the **Tokens-to-Block Verification** component.

### `4_train_verifier/`
Contains scripts to train the learning-based verifier used in the CoIn pipeline.

### `5_CoIn_pipeline/`
Implements the full CoIn auditing workflow. This is the core component of the project.

### `6_discussion/`
Code used in the discussion section of the paper. Requires a local deployment of `Qwen2.5-72B-Instruct` served with vLLM.

### `7_eval_data/`
Evaluation datasets used in the CoIn pipeline. Due to file size constraints, this directory includes only samples from **OpenR1-Math-220k** with a block size of 256, 10 samples per category. More comprehensive data will be released after the paper is accepted.
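
This section does not define "block size". The sketch below assumes it means the number of tokens per block when a token sequence is split into consecutive, fixed-size blocks, and that a shorter final block is kept. `split_into_blocks` is a hypothetical helper, not code from this repository.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_blocks(tokens: Sequence[T], block_size: int = 256) -> List[List[T]]:
    """Hypothetical helper: cut a token sequence into consecutive blocks.

    Every block has block_size tokens except possibly the last one, which
    holds the remainder (assumption: the partial block is kept, not dropped).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [list(tokens[i:i + block_size]) for i in range(0, len(tokens), block_size)]
```

Under this assumption, a 600-token sequence yields three blocks of 256, 256, and 88 tokens.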

## Reproducibility

All experiments in this project use a fixed random seed of `42` to ensure that all reported results are reproducible.
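
The README does not show where the seed is applied. The sketch below assumes a typical Python machine-learning stack and seeds Python's `random`, NumPy, and PyTorch. NumPy and PyTorch are optional here and are skipped if not installed. `set_seed` is a hypothetical helper, not a function from this repository.

```python
import os
import random


def set_seed(seed: int = 42) -> None:
    """Hypothetical helper: seed the RNGs commonly used in Python ML code."""
    random.seed(seed)
    # Hash randomization is fixed at interpreter start-up, so this only
    # affects child processes launched after this call.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed: nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call even without a GPU
    except ImportError:
        pass  # PyTorch not installed: nothing to seed
```

Calling `set_seed(42)` at the start of each script fixes the random draws. Bit-identical results on GPU can additionally depend on deterministic-kernel settings, which this sketch does not cover.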
